perm filename AMANUE.NSI[X,ALS] blob
sn#087633 filedate 1974-02-20 generic text, type T, neo UTF8
00100 The Amanuensis Speech Recognition System
00150
00200 by
00300 James L. Hieronymus
00400 Neil J. Miller
00500 Arthur L. Samuel
00600
00700 Stanford A.I. Laboratory, Stanford University
00800
00900 Abstract
01000 The Amanuensis speech recognition system under development at the
01100 Stanford A.I. Laboratory is a front end system that attempts to
01200 extract the maximum amount of linguistic information from the
01300 acoustic speech signal and that uses machine learning techniques. It
01400 differs from the system previously reported in a number of important
01500 respects:
01505 1) Parameters for all voiced regions are determined pitch
01506 synchronously.
01600 2) A new acoustic segmenter is used to extract certain
01700 features directly from the acoustic input and to isolate regions for
01800 special treatment.
02100 3) A new formant extractor is used which obviates the need
02200 for tracking and which can be used with FFT data, thus preserving
02300 bandwidth and formant shape information.
02400 4) Use is made of information from both the steady or nearly
02500 steady regions and the transition regions.
02600 5) Speaker normalization is to be done partly by formula and
02700 partly by learning, with a bootstrapping technique proposed to adapt
02800 the system to different speakers.
02900 6) Greater use is made of the redundancy of speech to improve
03000 recognition.
03100 7) An improved form of signature table has been developed for
03200 use by the learning routines, yielding better accuracy and a better
03300 compromise between the need for unnecessarily large amounts of
03400 training material and the need for smoothing.
03500 8) Several alternate output streams of phonemes are produced,
03600 with probability ratings both for the complete streams and for the
03700 individual phonemes, so that it should never be necessary to go
03800 back to the original acoustic data in order to resolve ambiguities or
03900 to incorporate syntactic, semantic and contextual information in the
04000 decision process.
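
As an illustration of item 8, the following is a minimal sketch of how several alternate phoneme streams might be ranked, each carrying a rating for the whole stream and for every phoneme in it. The abstract does not say how the streams or their ratings are computed. The function name alternate_streams, the per-segment rating dictionaries, and the scoring of a stream as the product of its phoneme ratings (an independence assumption) are all hypothetical choices made for this example.

```python
import heapq
from itertools import product
from math import prod


def alternate_streams(segments, k=3):
    """Return the k highest-rated phoneme streams.

    segments: one dict per acoustic segment, mapping a candidate phoneme
    label to its probability rating (hypothetical input format).
    Each result is (stream_rating, phonemes, phoneme_ratings).
    ASSUMPTION: the stream rating is the product of the per-phoneme
    ratings, i.e. segments are treated as independent.
    """
    candidates = []
    # Exhaustive enumeration of every combination; adequate for a short
    # sketch, but a real system would prune the search.
    for combo in product(*(seg.items() for seg in segments)):
        phonemes = [label for label, _ in combo]
        ratings = [rating for _, rating in combo]
        candidates.append((prod(ratings), phonemes, ratings))
    return heapq.nlargest(k, candidates, key=lambda c: c[0])


# Made-up ratings for three segments, for illustration only.
example = [
    {"s": 0.7, "z": 0.3},
    {"i": 0.6, "e": 0.4},
    {"t": 0.9, "d": 0.1},
]
for rating, phonemes, per_phoneme in alternate_streams(example, k=3):
    print(" ".join(phonemes), round(rating, 3), per_phoneme)
```

Because each stream keeps its individual phoneme ratings alongside the overall rating, a later stage applying syntactic or semantic constraints could choose among the streams, or among phonemes within a stream, using these ratings alone.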